System and method for visual analysis of on-image gestures

ABSTRACT

A method and system for providing at least a link to a content item related to a multimedia content element respective of an on-image gesture. The method comprises receiving, from a user device, at least one on-image gesture and the multimedia content element; analyzing the at least one on-image gesture to determine at least one portion of the multimedia content element that a user is interested in; generating at least one signature for each of the at least one portion; determining a content item corresponding to the at least one identified portion of the multimedia content element, wherein the determination is based in part on a type of the at least one on-image gesture; and modifying the received multimedia content element to include at least a link to an informative resource containing the content item.

CROSS-REFERENCE TO RELATED APPLICATIONS

This application claims the benefit of U.S. Provisional Application No. 61/763,505 filed on Feb. 12, 2013, the contents of which are hereby incorporated by reference. This application is a continuation-in-part (CIP) of U.S. patent application Ser. No. 13/685,182 filed on Nov. 26, 2012, now pending, which is a CIP of:

(a) U.S. patent application Ser. No. 13/624,397 filed on Sep. 21, 2012, now pending;

(b) U.S. patent application Ser. No. 13/344,400 filed on Jan. 5, 2012, now pending, which is a continuation (CON) of U.S. patent application Ser. No. 12/434,221, filed May 1, 2009, now U.S. Pat. No. 8,112,376;

(c) U.S. patent application Ser. No. 12/084,150 having a filing date of Apr. 7, 2009, now allowed, which is the National Stage of International Application No. PCT/IL2006/001235, filed on Oct. 26, 2006, which claims foreign priority from Israeli Application No. 171577 filed on Oct. 26, 2005 and Israeli Application No. 173409 filed on 29 Jan. 2006; and,

(d) U.S. patent application Ser. No. 12/195,863, filed Aug. 21, 2008, now U.S. Pat. No. 8,326,775, which claims priority under 35 USC 119 from Israeli Application No. 185414, filed on Aug. 21, 2007, and which is also a continuation-in-part (CIP) of the above-referenced U.S. patent application Ser. No. 12/084,150.

All of the applications referenced above are herein incorporated by reference for all that they contain.

TECHNICAL FIELD

The present invention relates generally to the analysis of multimedia content, and more specifically to a system for providing content and links to content displayed as part of a web-page.

BACKGROUND

Web-pages are information resources that are suitable for the World Wide Web (WWW) and can be accessed through a web browser. Web-pages typically contain text and multimedia content elements that are intended for display on a user's display device. Multimedia content elements are generally displayed using portions of code written in, for example, hyper-text mark-up language (HTML) or JavaScript that is inserted into, or otherwise called up by documents also written in HTML and which are sent to a user node for display.

Multimedia content elements displayed in such web-pages are usually non-interactive, thereby allowing users to view the multimedia content elements, but not to connect with such multimedia content. At most, the user is enabled to leave some feedback regarding the multimedia content within the web-page. Therefore, if a user wishes to receive information regarding an item viewed in, for example, a video, further search efforts are required.

SUMMARY

Certain embodiments disclosed herein include a method and system for providing at least a link to a content item related to a multimedia content element respective of an on-image gesture. The method comprises receiving, from a user device, at least one on-image gesture and the multimedia content element; analyzing the at least one on-image gesture to determine at least one portion of the multimedia content element that a user is interested in; generating at least one signature for each of the at least one portion; determining a content item corresponding to the at least one identified portion of the multimedia content element, wherein the determination is based in part on a type of the at least one on-image gesture; and modifying the received multimedia content element to include at least a link to an informative resource containing the content item.

BRIEF DESCRIPTION OF THE DRAWINGS

The subject matter disclosed herein is particularly pointed out and distinctly claimed in the claims at the conclusion of the specification. The foregoing and other objects, features, and advantages of the invention will be apparent from the following detailed description taken in conjunction with the accompanying drawings.

FIG. 1 is a schematic block diagram of a network system utilized to describe the various embodiments.

FIG. 2 is a flowchart describing a process of matching an advertisement to multimedia content displayed on a web-page according to an embodiment.

FIG. 3 is a block diagram depicting the basic flow of information in the signature generator system.

FIG. 4 is a diagram showing the flow of patches generation, response vector generation, and signature generation in a large-scale speech-to-text system.

FIG. 5 is a flowchart describing a process for adding a link to multimedia content displayed on a web-page.

FIG. 6 is a flowchart describing a process for analyzing an on-image gesture received from a user according to an embodiment.

DETAILED DESCRIPTION

It is important to note that the embodiments disclosed herein are only examples of the many advantageous uses of the innovative teachings herein. In general, statements made in the specification of the present application do not necessarily limit any of the various claimed inventions. Moreover, some statements may apply to some inventive features but not to others. In general, unless otherwise indicated, singular elements may be in plural and vice versa with no loss of generality. In the drawings, like numerals refer to like parts through several views.

FIG. 1 shows an exemplary and non-limiting schematic diagram of a network system 100 utilized to describe the disclosed embodiments. A network 110 is used to communicate between different parts of the system. The network 110 may be the Internet, the world-wide-web (WWW), a local area network (LAN), a wide area network (WAN), a metro area network (MAN), and other networks capable of enabling communication between the elements of the system 100.

Further connected to the network 110 are one or more client applications, such as web browsers (WB) 120-1 through 120-n (collectively referred to hereinafter as web browsers 120 or individually as a web browser 120, merely for simplicity purposes). A web browser 120 is executed over a computing device including, for example, a personal computer (PC), a personal digital assistant (PDA), a mobile phone, a smart phone, a tablet computer, a wearable computing device, and other kinds of wired and mobile appliances, equipped with browsing, viewing, listening, filtering, and managing capabilities, etc., that are enabled as further discussed herein below. Each of the web browsers 120 may be implemented as an independent or plug-in application.

A server 130 is further connected to the network 110 and is configured to perform, in part, the embodiments disclosed herein. A request of the server 130 to analyze the multimedia content item can be sent by a script executed by a web-browser 120 in the web-page in response to the uploading of one or more multimedia content items to the web-page. Such a request may include a URL of the web-page or a copy of the web-page. The system 100 also includes a signature generator system (SGS) 140. In one embodiment, the SGS 140 is connected to the server 130. The server 130 is enabled to receive and serve multimedia content and causes the SGS 140 to generate a signature respective of the multimedia content. The process for generating the signatures for multimedia content is explained in more detail herein below with respect to FIGS. 3 and 4.

It should be noted that each of the server 130 and the SGS 140 typically comprises a processing unit (not shown) such as a processor, a CPU, and the like, that is coupled to a memory. The memory contains instructions that can be executed by the processing unit. The server 130 also includes an interface (not shown) to the network 110. In one embodiment, the server 130 is communicatively connected to, or includes, an array of Computational Cores configured as discussed in more detail below.

A plurality of web servers 150-1 through 150-m are also connected to the network 110, each of which is configured to generate and send multimedia content items to the server 130. The web servers 150-1 through 150-m typically, but not necessarily exclusively, are resources for information that can be associated with a multimedia content sent from a web browser 120. For example, a web server 150-1 may host the Wikipedia website.

The system 100 may be configured to generate customized channels of multimedia content. Accordingly, a web browser 120 or a client channel manager application (not shown), available on either the server 130, or the web browser 120, or as an independent or plug-in application, may enable a user to create customized channels of multimedia content by receiving selections made by a user as inputs. Such customized channels of multimedia content are personalized content channels that are generated in response to selections made by a user of the web browser 120 or the client channel manager application. The system 100, and in particular the server 130 in conjunction with the SGS 140, determines which multimedia content is more suitable to be viewed, played, or otherwise utilized by the user with respect to a given channel, based on the signatures of selected multimedia content. These channels may optionally be shared with other users, used and/or further developed cooperatively, and/or sold to other users or providers, and so on. The process for defining, generating, and customizing the channels of multimedia content is described in greater detail in the co-pending Ser. No. 13/344,400 application referenced above.

According to the embodiments disclosed herein, the server 130 is configured to carry out a process for providing a content item, or a link to an information resource containing the content item, associated with an input multimedia content element respective of an on-image gesture, an event, or a combination thereof. The on-image gesture and/or event are received from a web-browser 120. In response, the server 130 returns a modified web page including the multimedia content element together with the determined content item or a link thereto.

The on-image gesture or combination of gestures may include, but are not limited to: one or more touch gestures, one or more scrolls over the at least a portion of the multimedia content element, one or more clicks over the at least a portion of the multimedia content, one or more responses to the at least a portion of the multimedia content, a combination thereof, a portion thereof, and so on. The touch gestures may be related to computing devices with a touch screen display and such gestures include, but are not limited to, tapping on a content element, resizing a content element, swiping over a content element, changing the display orientation, and so on. In an embodiment, gestures detected by the web-browser can be sent in combination with one or more events. The event or combination of events may include, but are not limited to, a predetermined period of time in which a user views or interacts with the multimedia content element.

The server 130 is further configured to analyze the received on-image gestures and/or events to determine at least one portion of the received multimedia content element that is of particular interest to the user. Then, the server 130 by means of the SGS 140 is configured to generate a signature for each identified portion. Using the generated signatures and the type of the received on-image gesture and/or event, a search for content items relevant to the identified portion is performed. Thereafter, relevant content items, or links thereto, can be added as an overlay to the received multimedia content element displayed on a web-page.

A multimedia content element and content item may include, for example, an image, a graphic, a video stream, a video clip, an audio stream, an audio clip, a video frame, a photograph, and an image of signals (e.g., spectrograms, phasograms, scalograms, etc.), and/or combinations thereof and portions thereof.

It should be noted that the server 130 may analyze all or a sub-set of the multimedia content elements contained in the web-page. The SGS 140 generates at least one signature for portions of each multimedia content element provided by the server 130. The generated signature(s) may be robust to noise and distortion as discussed below. Then, using the generated signature(s), the server 130 is capable of matching the signature of a web-page accessible by a link to the multimedia content and providing the matched link. Such links may be extracted from the data warehouse 160. For example, if the signature of an image indicates the city of New York, then a link to the municipal website of the city of New York may be determined.

For instance, in order to provide a matching content item for a sports car it may be desirable to locate a car of a particular model. However, in most cases, the model of the car would not be part of the metadata associated with the multimedia content (image). Moreover, the car shown in an image may be displayed at an angle that differs from the angle of a specific photograph of the car that is available for use as a search item. The signature generated for that image would enable accurate recognition of the model of the car because the signatures generated for the multimedia content elements, according to the disclosed embodiments, allow for recognition and classification of multimedia elements, such as by content-tracking, video filtering, multimedia taxonomy generation, video fingerprinting, speech-to-text, audio classification, element recognition, video/image search and any other application requiring content-based signatures generation and matching for large content volumes such as web and other large-scale databases.

In one embodiment, the signatures generated for more than one multimedia content element are clustered. The clustered signatures are used to search for matching content items and to select one or more of the matching content items. The one or more selected matching content items are retrieved from the data warehouse 160 and uploaded to the web-page on the web browser 120 by means of one of the web servers 150.

FIG. 2 depicts an exemplary and non-limiting flowchart 200 describing the process of matching an advertisement to a multimedia content element displayed on a web-page. In S205, the method starts when a web-page is uploaded to one of the web-browsers (e.g., web-browser 120-1). In S210, a request to match at least one multimedia content element contained in the uploaded web-page to an appropriate content item is received. The request can be received from a web server (e.g., a server 150-1), a script running on the uploaded web-page, or an agent (e.g., an add-on) installed in the web-browser. S210 can also include extracting the multimedia content elements and requesting that respective signatures be generated.

In S220, a signature of the multimedia content element is generated. The generation of the signature for the multimedia content element by a signature generator is described below. In S230, an advertisement item is matched to the multimedia content element respective of its generated signature. In one embodiment, the matching process includes searching for at least one advertisement item with a matching signature respective of the signature of the multimedia content and displaying the at least one advertisement item within the display area of the web-page. In one embodiment, the matching of an advertisement to a multimedia content element can be performed by the computational cores that are part of a large scale matching discussed in detail below.

In S240, upon a user's gesture, the matched advertisement item is uploaded to the web-page and displayed therein. The user's gesture may be: a scroll on the multimedia content element; a tap on the multimedia content element; and/or a response to the multimedia content. This ensures that the user's attention is given to the content item by providing the advertised content only when the user has become interested in the multimedia content element. In S250, it is checked whether there are additional requests to analyze multimedia content elements and, if so, execution continues with S210; otherwise, execution terminates.

As a non-limiting example, a user uploads a web-page that contains an image of a sea shore. The image is then analyzed and a signature is generated respective thereto. Respective of the image signature, an advertisement item (e.g., a banner) is matched to the image, for example, a swimsuit advertisement. Upon detection of a user's gesture, for example, a mouse scrolling over the sea shore image, the swimsuit ad is displayed.
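As a further non-limiting illustration, the deferred display of S240 may be sketched in Python as follows. The class name, function name, and gesture labels are hypothetical and chosen only for this sketch; the embodiments do not prescribe any particular data model.

```python
from dataclasses import dataclass

# Hypothetical gesture labels; the description lists a scroll, a tap,
# and a response to the multimedia content as qualifying gestures.
TRIGGER_GESTURES = {"scroll", "tap", "response"}


@dataclass
class MatchedElement:
    element_id: str
    advertisement: str      # advertisement item matched in S230
    displayed: bool = False


def on_gesture(element: MatchedElement, gesture: str) -> bool:
    """S240: display the matched advertisement only after a qualifying gesture."""
    if gesture in TRIGGER_GESTURES:
        element.displayed = True
    return element.displayed


sea_shore = MatchedElement("sea-shore-image", "swimsuit banner")
on_gesture(sea_shore, "scroll")   # the swimsuit banner is now marked for display
```

In this sketch the advertisement is matched ahead of time and only its display is gated on the gesture, mirroring the order of S230 and S240.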

The web-page may contain a number of multimedia content elements; however, in some instances only a few advertisement items may be displayed in the web-page. Accordingly, in one embodiment, the signatures generated for the multimedia content elements are clustered and the cluster of signatures is matched to one or more advertisement items.
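The clustering algorithm is not specified herein. Merely as a minimal sketch, and assuming that each signature is represented as a set of active core indices and that similarity is measured by the Jaccard overlap (both assumptions made only for this illustration), a greedy clustering may look as follows:

```python
def jaccard(a: frozenset, b: frozenset) -> float:
    """Overlap between two signatures given as sets of active core indices."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def cluster_signatures(signatures, threshold=0.5):
    """Greedy clustering: place each signature in the first cluster whose
    representative (its first member) overlaps it by at least `threshold`;
    otherwise start a new cluster."""
    clusters = []
    for sig in signatures:
        for cluster in clusters:
            if jaccard(sig, cluster[0]) >= threshold:
                cluster.append(sig)
                break
        else:
            clusters.append([sig])
    return clusters


signatures = [
    frozenset({1, 2, 3, 4}),
    frozenset({1, 2, 3, 5}),
    frozenset({10, 11, 12}),
]
clusters = cluster_signatures(signatures)
```

Each resulting cluster could then be matched to one or more advertisement items instead of matching every element individually. The threshold value of 0.5 is arbitrary.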

FIGS. 3 and 4 illustrate the generation of signatures for the multimedia content elements by the SGS 140 according to one embodiment. An exemplary high-level description of the process for large scale matching is depicted in FIG. 3. In this example, the matching is for a video content.

Video content segments 2 from a Master database (DB) 6 and a Target DB 1 are processed in parallel by a large number of independent computational Cores 3 that constitute an architecture for generating the Signatures (hereinafter the “Architecture”). Further details on the computational Cores generation are provided below. The independent Cores 3 generate a database of Robust Signatures and Signatures 4 for Target content-segments 5 and a database of Robust Signatures and Signatures 7 for Master content-segments 8. An exemplary and non-limiting process of signature generation for an audio component is shown in detail in FIG. 4. Finally, Target Robust Signatures and/or Signatures are effectively matched, by a matching algorithm 9, to Master Robust Signatures and/or Signatures database to find all matches between the two databases.

To demonstrate an example of signature generation process, it is assumed, merely for the sake of simplicity and without limitation on the generality of the disclosed embodiments, that the signatures are based on a single frame, leading to certain simplification of the computational cores generation. The Matching System is extensible for signatures generation capturing the dynamics in-between the frames.

The Signatures' generation process will now be described with reference to FIG. 4. The first step in the process of signatures generation from a given speech-segment is to break down the speech-segment into K patches 14 of random length P and random position within the speech segment 12. The breakdown is performed by the patch generator component 21. The values of the number of patches K, the random length P, and the random position parameters are determined based on optimization, considering the tradeoff between accuracy rate and the number of fast matches required in the flow process of the server 130 and SGS 140. Thereafter, all the K patches are injected in parallel into all computational Cores 3 to generate K response vectors 22, which are fed into a signature generator system 23 to produce a database of Robust Signatures and Signatures 4.
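Merely as an illustrative sketch of the patch generation step, the following Python function draws K patches of random length and random position from a segment. The function name, the length bounds `min_len` and `max_len`, and the `seed` parameter are assumptions introduced for this sketch; the description states only that K and the length and position parameters are set by optimization.

```python
import random


def generate_patches(segment, k, min_len, max_len, seed=None):
    """Break `segment` into k contiguous patches of random length and position."""
    rng = random.Random(seed)
    n = len(segment)
    if not 1 <= min_len <= max_len <= n:
        raise ValueError("patch length bounds must fit inside the segment")
    patches = []
    for _ in range(k):
        length = rng.randint(min_len, max_len)   # random length P (inclusive bounds)
        start = rng.randint(0, n - length)       # random position within the segment
        patches.append(segment[start:start + length])
    return patches


samples = list(range(100))   # stand-in for a sampled speech segment
patches = generate_patches(samples, k=8, min_len=5, max_len=20, seed=0)
```

Patches may overlap, since each is drawn independently. Every patch would then be injected into all of the computational Cores.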

In order to generate Robust Signatures, i.e., Signatures that are robust to additive noise, by the L Computational Cores 3 (where L is an integer equal to or greater than 1), a frame ‘i’ is injected into all the Cores 3. Then, the Cores 3 generate two binary response vectors: $\vec{S}$, which is a Signature vector, and $\overrightarrow{RS}$, which is a Robust Signature vector.

For generation of signatures robust to additive noise, such as White-Gaussian-Noise, scratch, etc., but not robust to distortions, such as crop, shift, and rotation, etc., a core $C_i = \{n_i\}$ ($1 \le i \le L$) may consist of a single leaky integrate-to-threshold unit (LTU) node or more nodes. The node $n_i$ equations are:

$V_i = \sum_{j} w_{ij} k_j$

$n_i = \theta(V_i - Th_x)$

where $\theta$ is a Heaviside step function; $w_{ij}$ is a coupling node unit (CNU) between node i and image component j (for example, grayscale value of a certain pixel j); $k_j$ is an image component ‘j’ (for example, grayscale value of a certain pixel j); $Th_x$ is a constant Threshold value, where x is ‘S’ for Signature and ‘RS’ for Robust Signature; and $V_i$ is a Coupling Node Value.
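As a non-limiting numerical sketch, the node equations may be evaluated in Python as follows. The function names and the example weights are hypothetical. The value of the step function at exactly zero is an assumption, and the sketch implements only the weighted sum and threshold given above.

```python
def heaviside(x: float) -> int:
    """Heaviside step: 1 when x > 0, else 0 (the value at 0 is an assumption)."""
    return 1 if x > 0 else 0


def core_responses(weights, components, th_s, th_rs):
    """Return the binary vectors (S, RS) for one frame.

    weights[i][j] plays the role of w_ij and components[j] the role of k_j.
    """
    signature, robust_signature = [], []
    for w_i in weights:
        v_i = sum(w * k for w, k in zip(w_i, components))   # coupling node value V_i
        signature.append(heaviside(v_i - th_s))              # threshold Th_S
        robust_signature.append(heaviside(v_i - th_rs))      # threshold Th_RS
    return signature, robust_signature


weights = [[1.0, 0.0], [0.5, 0.5], [0.0, 0.1]]
components = [4.0, 2.0]           # e.g., grayscale values of two pixels
s, rs = core_responses(weights, components, th_s=1.0, th_rs=3.5)
# V = [4.0, 3.0, 0.2], so s == [1, 1, 0] and rs == [1, 0, 0]
```

Because the Robust Signature threshold is the higher of the two in this example, every node that is active in the Robust Signature is also active in the Signature.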

The Threshold values $Th_x$ are set differently for Signature generation and for Robust Signature generation. For example, for a certain distribution of $V_i$ values (for the set of nodes), the thresholds for Signature ($Th_S$) and Robust Signature ($Th_{RS}$) are set apart, after optimization, according to one or more of the following criteria:

1: For $V_i > Th_{RS}$:

$1 - p(V > Th_S) = 1 - (1 - \varepsilon)^{l} \ll 1$

i.e., given that l nodes (cores) constitute a Robust Signature of a certain image I, the probability that not all of these l nodes will belong to the Signature of the same, but noisy, image is sufficiently low (according to a system's specified accuracy).

2: $p(V_i > Th_{RS}) \approx l/L$

i.e., approximately l out of the total L nodes can be found to generate a Robust Signature according to the above definition.

3: Both a Robust Signature and a Signature are generated for a certain frame i.
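Merely to illustrate criteria 1 and 2 numerically, the following Python sketch picks a Robust Signature threshold so that l of the L coupling node values exceed it, and evaluates the expression 1 − (1 − ε)^l. The function names are hypothetical. The sketch assumes that ε is the per-node probability of a Robust Signature node falling below the Signature threshold under noise and that nodes fail independently; the description does not state these assumptions explicitly.

```python
def robust_threshold(values, l):
    """Choose Th_RS so that l of the L values lie above it (criterion 2).

    Uses the midpoint between the l-th and (l+1)-th largest values; with
    tied values at that boundary the count above the threshold can differ.
    """
    ranked = sorted(values, reverse=True)
    if not 0 < l < len(ranked):
        raise ValueError("l must satisfy 0 < l < L")
    return (ranked[l - 1] + ranked[l]) / 2


def prob_not_all_survive(epsilon, l):
    """1 - (1 - epsilon)^l: chance that at least one of l nodes drops out,
    under the independence assumption stated above (criterion 1)."""
    return 1 - (1 - epsilon) ** l


coupling_values = [0.1, 0.9, 0.4, 0.7, 0.3, 0.8]    # V_i for L = 6 nodes
th_rs = robust_threshold(coupling_values, l=2)
above = sum(v > th_rs for v in coupling_values)       # nodes in the Robust Signature
risk = prob_not_all_survive(epsilon=0.01, l=10)       # small, as criterion 1 requires
```

With a per-node failure probability of 0.01 and l = 10, the risk stays below 0.1; a system's specified accuracy would determine whether that is sufficiently low.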

It should be understood that the generation of a signature is unidirectional, and typically yields lossy compression, where the characteristics of the compressed data are maintained but the uncompressed data cannot be reconstructed. Therefore, a signature can be used for the purpose of comparison to another signature without the need of comparison to the original data. A detailed description of the Signature generation can be found in U.S. Pat. Nos. 8,326,775 and 8,312,031, assigned to the common assignee, which are hereby incorporated by reference for all the useful information they contain.

Computational Core generation is a process of definition, selection, and tuning of the parameters of the cores for a certain realization in a specific system and application. The process is based on several design considerations, such as:

(a) The Cores should be designed so as to obtain maximal independence, i.e., the projection from a signal space should generate a maximal pair-wise distance between any two cores' projections into a high-dimensional space.

(b) The Cores should be optimally designed for the type of signals, i.e., the Cores should be maximally sensitive to the spatio-temporal structure of the injected signal, for example, and in particular, sensitive to local correlations in time and space. Thus, in some cases a core represents a dynamic system, such as in state space, phase space, edge of chaos, etc., which is uniquely used herein to exploit their maximal computational power.

(c) The Cores should be optimally designed with regard to invariance to a set of signal distortions of interest in relevant applications.

A detailed description of the Computational Core generation, the computational architecture, and the process for configuring such cores is provided in the co-pending U.S. patent application Ser. No. 12/084,150 referenced above.

FIG. 5 depicts an exemplary and non-limiting flowchart 500 describing the process of adding an overlay to multimedia content displayed on a web-page. In S510, the method starts when a web-page is uploaded to a web-browser (e.g., web-browser 120-1). In another embodiment, the method starts when a web-server (e.g., web-server 150-1) receives a request to host the requested web-page. In S515, the server 130 receives the uniform resource locator (URL) of the uploaded web-page. In another embodiment, the uploaded web-page includes an embedded script. The script extracts the URL of the web-page, and sends the URL to the server 130. In another embodiment, an add-on installed in the web-browser 120 extracts the URL of the uploaded web-page, and sends the URL to the server 130. In yet another embodiment, an agent is installed on a user device executing the web browser 120. The agent is configured to monitor web-pages uploaded to the web-site to determine when web-pages have been uploaded to the web-site, extract the URLs, and send the URLs to the server 130. In another embodiment, a web-server (e.g., server 150) hosting the requested web-page provides the server 130 with the URL of the requested web-page. It should be noted that only URLs of selected web sites can be sent to the server 130, for example, URLs related to web-sites that paid for the additional information.

In S520, the server 130 downloads the web-page respective of each received URL. In S525, the server 130 analyzes the web-page in order to identify the existence of one or more multimedia content elements in the uploaded web-page. It should be understood that a multimedia content, such as an image or a video, may include a plurality of multimedia content elements. In S530, the SGS 140 generates at least one signature for each multimedia content element identified by the server 130. The signatures for the multimedia elements are generated as described in greater detail above.
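As a non-limiting sketch of the identification in S525, the following Python code scans downloaded HTML for multimedia elements using the standard-library `html.parser` module. The choice of tags treated as multimedia and the names `MediaElementFinder` and `find_media_elements` are assumptions made for illustration; the description does not specify how the server 130 parses the web-page.

```python
from html.parser import HTMLParser

# Assumed set of tags treated as multimedia content elements.
MEDIA_TAGS = {"img", "video", "audio"}


class MediaElementFinder(HTMLParser):
    """Collects (tag, src) pairs for multimedia elements found in a page."""

    def __init__(self):
        super().__init__()
        self.elements = []

    def handle_starttag(self, tag, attrs):
        if tag in MEDIA_TAGS:
            self.elements.append((tag, dict(attrs).get("src")))


def find_media_elements(html_text):
    finder = MediaElementFinder()
    finder.feed(html_text)
    finder.close()
    return finder.elements


page = ('<html><body><p>Cinema</p><img src="shore.jpg">'
        '<video src="clip.mp4"></video></body></html>')
elements = find_media_elements(page)
# [('img', 'shore.jpg'), ('video', 'clip.mp4')]
```

Each element found this way would then be passed to the SGS 140 for signature generation in S530.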

In S540, respective of each signature, the server 130 determines one or more links to content that exists on a web server, for example, each of the web servers 150-1 through 150-m that can be associated with the multimedia element. A link may be a hyperlink, a URL, and the like. The content accessed through the link may be, for example, informative web-pages such as a Wikipedia® article. The determination of the link may be made by identification of the context of the signatures generated by the server 130. For example, if a multimedia content element was identified as a football player, a signature is generated respective thereto, and a link to a sport website that contains information about the football player is determined. In S550, the determined link to the content is added as an overlay to the web-page by the server 130, respective of the corresponding multimedia content element. According to one embodiment, a link that contains the overlay may be provided to a web browser respective of a user's gesture. A user's gesture may be, for example, a click on the multimedia content element through, for example, a computer mouse, a touch pad, or a touch screen; and/or a response to the multimedia content (e.g., movement detected by a motion sensor, noise detected by a microphone, etc.).

The modified web-page that includes at least one multimedia element with the added link can be sent directly to the web browser (e.g., browser 120-1) requesting the web-page. This requires establishing a data session between the server 130 and the web browsers 120. In another embodiment, the multimedia element including the added link is returned to a web server (e.g., server 150-1) hosting the requested web-page. The web server (e.g., server 150-1) subsequently returns the requested web-page with the multimedia element containing the added link to the web browser (e.g., browser 120-1) requesting the web-page. Once the “modified” web page is displayed over the web browser, a detected event or user's gesture with respect to the multimedia content element would cause the browser to upload the content (e.g., a Wikipedia® article web page) addressed by the link added to the multimedia element.

In S560, it is checked whether the one or more multimedia content elements contained in the web-page have changed, and if so, execution continues with S525; otherwise, execution terminates.

Different portions of the multimedia content element may be associated with different server content or links to server content. As a non-limiting example, a web-page related to cinema is uploaded and an image of the movie “Pretty Woman” showing actor Richard Gere and actress Julia Roberts is identified within the web-page by the server 130. A signature is generated by the SGS 140 respective of the actor Richard Gere and the actress Julia Roberts, both shown as portions of the image. A link to Richard Gere's biography on the Wikipedia® website and a link to Julia Roberts' biography on the Wikipedia® website are then determined respective of the signatures and the context of the signatures as further described herein above. The context of the signatures according to this example may be “American Movie Actors.”

An overlay containing the links to Richard Gere's biography on the Wikipedia® website and Julia Roberts' biography on the Wikipedia® website is added over the image such that upon detection of a specified event or a user's gesture, for example, a gesture wherein a mouse clicks on the part of the image where Richard Gere is shown, the link to Richard Gere's biography on Wikipedia® is provided to the user.
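Merely as an illustrative sketch, an overlay of this kind may be modeled as a list of rectangular regions, each carrying a link, with a click resolved by a hit test. The class and function names, the pixel coordinates, and the `example.org` URLs are hypothetical placeholders and do not represent actual resources.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class OverlayRegion:
    label: str
    left: int
    top: int
    right: int
    bottom: int
    link: str


def link_for_click(regions, x: int, y: int) -> Optional[str]:
    """Return the link of the first region containing the click, if any."""
    for region in regions:
        if region.left <= x < region.right and region.top <= y < region.bottom:
            return region.link
    return None


overlay = [
    OverlayRegion("Richard Gere", 0, 0, 200, 300, "https://example.org/richard-gere"),
    OverlayRegion("Julia Roberts", 200, 0, 400, 300, "https://example.org/julia-roberts"),
]
clicked = link_for_click(overlay, x=50, y=120)   # click lands on the first portion
```

A click outside every region returns no link, so the image behaves as an ordinary image there.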

According to another embodiment, a request for a URL of a web-page that contains an embedded video clip is received. The video content within the requested web-page is analyzed and a signature is generated respective of the entertainer Madonna that is shown in the video content. A link to Madonna's official web-page hosted on a web-server 150-n is then determined respective of the signature as further described herein above. An overlay containing the link to Madonna's official web-page is then added over the video content. The web-page together with the link to Madonna's official web-page is then sent to the web server 150-1. Then, the requested web-page with the modified video element is uploaded to the web-browser 120-1.

The web-page may contain a number of multimedia content elements; however, in some instances only a few links may be displayed in the web-page. Accordingly, in one embodiment, the signatures generated for the multimedia content elements are clustered and the cluster of signatures is matched to one or more content items.

FIG. 6 depicts an exemplary and non-limiting flowchart 600 describing a method of analyzing an on-image gesture received by a user device and providing a content item respective thereof according to an embodiment. The method can be performed by the server 130 using the SGS 140.

In S610, the method starts when at least a portion of a multimedia content element from a web-page as well as at least one gesture, event, or combination thereof, is received. The on-image gestures and the multimedia content are captured and sent by a web-browser (e.g., WB 120-1) executed over a user device. In an embodiment, a URL of the web-page and an identifier of the multimedia content associated with the detected gesture and/or event are provided. On-image gestures may include, but are not limited to: one or more touch gestures, one or more scrolls over the at least a portion of the multimedia content element, one or more clicks over the at least a portion of the multimedia content, one or more responses to the at least a portion of the multimedia content, a combination thereof, a portion thereof, and so on. The touch gestures may be related to computing devices with a touch screen display and such gestures may include, but are not limited to, tapping on a content element, resizing a content element, swiping over a content element, changing the display orientation, and so on. In an embodiment, gestures detected by the web-browser can be sent together with one or more events. Alternatively, the web-browser 120 can send only events related to the interaction of a user with the content element. Events may include, but are not limited to, a predetermined period of time in which a user views or interacts with the multimedia content element.

In S620, the received gestures and/or events are analyzed to determine at least one portion of the received multimedia content element that is of particular interest to the user. As a non-limiting example, if a multimedia content element is an image featuring a man and a boat, and the user zooms in on the boat (an event of expanding a part of a screen that demonstrates an interest in the particular portion of the image that is expanded), the boat is determined to be the portion of the multimedia content element that is of particular interest to the user.
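One way to picture the analysis of S620 is sketched below, under stated assumptions. It assumes that candidate portions of the image are already available as named bounding boxes and that a zoom gesture is reported as the rectangle left visible after zooming; neither assumption comes from this disclosure, and the names `Box` and `portion_of_interest` are hypothetical. The portion whose area is best covered by the zoomed-in rectangle is taken as the portion of interest.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Box:
    """Axis-aligned rectangle in image pixel coordinates."""
    left: float
    top: float
    right: float
    bottom: float

    def area(self) -> float:
        # A box with inverted edges (an empty intersection) has zero area.
        return max(0.0, self.right - self.left) * max(0.0, self.bottom - self.top)

    def intersection(self, other: "Box") -> "Box":
        return Box(
            max(self.left, other.left),
            max(self.top, other.top),
            min(self.right, other.right),
            min(self.bottom, other.bottom),
        )


def portion_of_interest(zoom_viewport: Box, portions: Dict[str, Box]) -> Optional[str]:
    """Return the name of the portion best covered by the zoomed-in viewport."""
    best_name, best_cover = None, 0.0
    for name, box in portions.items():
        if box.area() == 0:
            continue
        cover = zoom_viewport.intersection(box).area() / box.area()
        if cover > best_cover:
            best_name, best_cover = name, cover
    return best_name


# Hypothetical layout of the man-and-boat image from the example above.
portions = {"man": Box(0, 0, 100, 200), "boat": Box(150, 50, 400, 200)}
zoom = Box(140, 40, 410, 210)  # the user zoomed in around the boat
selected = portion_of_interest(zoom, portions)
```

With these placeholder coordinates the zoomed-in rectangle covers the boat entirely and does not touch the man, so the boat is selected.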

In S630, at least one signature is generated for each portion of the multimedia content element identified in S620. The signatures for the multimedia content elements are generated as described in greater detail above.

In S640, respective of the at least one signature of each portion of the multimedia content element, the received on-image gestures and/or events corresponding to the at least a portion of the multimedia content element are determined. Each different gesture, event, set of gestures, set of events, and combinations thereof, received from a user can be differentiated and associated with different links or content from a server. As an example, a click on the at least a portion of the multimedia content may be determined as a first gesture associated with, e.g., a link to a Wikipedia® article, and a double click on the at least a portion of the multimedia content may be determined as a different gesture associated with push data being delivered to the user. In an embodiment, a preconfigured table that maps a type of gesture, a type of event, or a combination of a gesture and an event to the type of content item and its delivery method is saved in the data warehouse 160 and is accessible by the server 130. Furthermore, one of ordinary skill should appreciate that an on-image gesture can be a graphic, a video stream, a video clip, an audio stream, an audio clip, a video frame, and a photograph.
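The preconfigured table described in S640 can be sketched, purely for illustration, as a lookup keyed by a gesture type and an optional event type. The row contents, the labels, and the fallback from a gesture-and-event combination to the gesture alone are assumptions made for this sketch and are not specified by this disclosure.

```python
from typing import Dict, NamedTuple, Optional, Tuple


class Delivery(NamedTuple):
    """Type of content item and how it is delivered (hypothetical labels)."""
    content_type: str
    method: str


# Hypothetical rows; a real table would be stored in the data warehouse.
# The key is (gesture type, event type); None means "no event required".
GESTURE_TABLE: Dict[Tuple[str, Optional[str]], Delivery] = {
    ("click", None): Delivery("encyclopedia_article", "link"),
    ("double_click", None): Delivery("related_data", "push"),
}


def lookup(gesture: str, event: Optional[str] = None) -> Optional[Delivery]:
    """Find the delivery rule for a gesture, optionally combined with an event.

    Assumption: if the exact combination has no row, fall back to the
    row registered for the gesture alone.
    """
    exact = GESTURE_TABLE.get((gesture, event))
    if exact is not None:
        return exact
    return GESTURE_TABLE.get((gesture, None))


single = lookup("click")          # a link to an article
double = lookup("double_click")   # push data delivered to the user
unknown = lookup("pinch")         # no rule configured
```

Keeping the mapping in data rather than in code allows new gesture and event combinations to be associated with content without changing the server logic.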

In S650, respective of each signature for the portion of the multimedia content element and corresponding gestures and/or events, a search is performed for content items that can be associated with the multimedia element respective of the gestures and/or events. This determination may be performed by matching signatures generated for the portion of the multimedia content element with potential content items. The search for such content items is performed using a data warehouse 160 by the web servers 150. A content item is determined to be related to the portion of multimedia content element when their respective signatures (as generated by the SGS 140) match. The signature matching process is described in more detail above. In an exemplary embodiment, when two signatures overlap by more than a predetermined threshold level, for example, 60%, the signatures may be considered as matching.
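The threshold test of S650 can be sketched as follows. The representation of a signature and the overlap measure are assumptions chosen only for illustration: each signature is modeled as a set of integers, and the overlap is the size of the intersection divided by the size of the smaller signature. The signatures actually produced by the SGS 140 and the matching process described above may differ.

```python
from typing import FrozenSet


def overlap_ratio(a: FrozenSet[int], b: FrozenSet[int]) -> float:
    """Fraction of the smaller signature that also appears in the other one."""
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))


def signatures_match(a: FrozenSet[int], b: FrozenSet[int], threshold: float = 0.6) -> bool:
    """Two signatures match when their overlap exceeds the threshold (60% here)."""
    return overlap_ratio(a, b) > threshold


# Hypothetical signatures for an image portion and two candidate content items.
portion_sig = frozenset({1, 2, 3, 4, 5})
item_a_sig = frozenset({1, 2, 3, 4, 9, 10})   # shares 4 of 5 elements
item_b_sig = frozenset({1, 2, 20, 21, 22})    # shares 2 of 5 elements

match_a = signatures_match(portion_sig, item_a_sig)
match_b = signatures_match(portion_sig, item_b_sig)
```

In this sketch the first candidate overlaps by 80% and is treated as matching, while the second overlaps by 40% and is not.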

In an embodiment, the search for relevant content items is not limited to the data warehouse. The search can be performed using signatures generated by the SGS 140 and the identified context in data sources that index searchable content including, but not limited to, multimedia content items using signatures and concepts. A context is determined as the correlation between a plurality of concepts. An example of such indexing techniques using signatures is disclosed in a co-pending U.S. patent application Ser. No. 13/766,463, filed Feb. 13, 2013, entitled “A SYSTEM AND METHODS FOR GENERATION OF A CONCEPT BASED DATABASE”, assigned to common assignee, and is hereby incorporated by reference for all the useful information it contains.

In one embodiment, the signatures generated for more than one unstructured data element are clustered. The clustered signatures are used to search for a common concept. The concept is a collection of signatures representing elements of the unstructured data and metadata describing the concept. As a non-limiting example, a ‘Superman concept’ is a signature-reduced cluster of signatures describing elements (such as multimedia elements) related to, e.g., a Superman cartoon, together with a set of metadata providing a textual representation of the Superman concept. Techniques for generating concepts and concept structures are also described in the co-pending U.S. patent application Ser. No. 12/603,123 (hereinafter the '123 Application) to Raichelgauz et al., which is assigned to common assignee, and is hereby incorporated by reference for all that it contains.

In S660, the determined related content or a link to the determined content is added as an overlay to the web-page respective of the corresponding multimedia content element and the corresponding gestures and/or events. According to one embodiment (not shown), a vocabulary of the determined gestures and/or events may be provided as part of the overlay. Such vocabulary may include, but is not limited to, one or more gestures and/or events, and a description of the corresponding server content or links to server content that will be provided upon occurrence of the one or more gestures and/or events.

In an embodiment, the modified web-page that includes at least one multimedia element with the added link can be sent directly to the web browser (e.g., browser 120-1) requesting the web-page. This requires establishing a data session between the server 130 and the web browsers 120. In another embodiment, the multimedia element including the added link is returned to a web server (e.g., server 150-1) hosting the requested web-page. The web server (e.g., server 150-1) returns the requested web-page with the multimedia element containing the added link to the web browser (e.g., browser 120-1) requesting the web-page. Once the “modified” web page is displayed over the web browser, a detected user's gesture and/or the occurrence of an event over the multimedia element would cause the browser to load the content addressed by the link added to the multimedia element.

In S670, it is checked whether one or more gestures and/or events have occurred and, if so, execution continues with S610; otherwise, execution terminates.

As another non-limiting example, a touch gesture associated with a question mark as per the vocabulary may provide an informative link, and a touch gesture associated with an exclamation mark as per the vocabulary may provide a link in which the user will be able to respond to the image by, e.g., leaving a written comment regarding the image.

As a further non-limiting example, the multimedia content element may be a video clip of a music video of a particular song. Additionally, the video clip may have a content item related to purchasing the song, and the link to this server content may be related to the combination of the event that a user views the video for at least 30 seconds and the gesture of swiping on a touch screen. If a user proceeds to view the clip for one minute and then swipes the touch screen, the user will be provided with a link to a website that would allow the user to purchase the song.
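The combination of an event and a gesture in the music video example can be expressed as a simple rule, sketched below. The class name, function name, gesture label, and URL are hypothetical placeholders; only the 30-second viewing period and the swipe gesture are taken from the example.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Interaction:
    """What the user did with the video clip (hypothetical structure)."""
    viewed_seconds: float
    gestures: Tuple[str, ...]


def purchase_link_for(
    interaction: Interaction,
    min_view_seconds: float = 30.0,
    link: str = "https://example.com/buy-song",  # placeholder URL
) -> Optional[str]:
    """Return the purchase link only when both conditions of the rule hold:
    the clip was viewed for at least the minimum time AND the user swiped."""
    viewed_long_enough = interaction.viewed_seconds >= min_view_seconds
    swiped = "swipe" in interaction.gestures
    return link if viewed_long_enough and swiped else None


# One minute of viewing followed by a swipe satisfies the rule.
result = purchase_link_for(Interaction(viewed_seconds=60.0, gestures=("swipe",)))
```

Under this rule, a swipe after only a few seconds of viewing, or a long viewing period with no swipe, yields no link.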

The various embodiments disclosed herein can be implemented as hardware, firmware, software, or any combination thereof. Moreover, the software is preferably implemented as an application program tangibly embodied on a program storage unit or computer readable medium consisting of parts, or of certain devices and/or a combination of devices. The application program may be uploaded to, and executed by, a machine comprising any suitable architecture. Preferably, the machine is implemented on a computer platform having hardware such as one or more central processing units (“CPUs”), a memory, and input/output interfaces. The computer platform may also include an operating system and microinstruction code. The various processes and functions described herein may be either part of the microinstruction code or part of the application program, or any combination thereof, which may be executed by a CPU, whether or not such a computer or processor is explicitly shown. In addition, various other peripheral units may be connected to the computer platform such as an additional data storage unit and a printing unit. Furthermore, a non-transitory computer readable medium is any computer readable medium except for a transitory propagating signal.

All examples and conditional language recited herein are intended for pedagogical purposes to aid the reader in understanding the principles of the invention and the concepts contributed by the inventor to furthering the art, and are to be construed as being without limitation to such specifically recited examples and conditions. Moreover, all statements herein reciting principles, aspects, and embodiments of the invention, as well as specific examples thereof, are intended to encompass both structural and functional equivalents thereof. Additionally, it is intended that such equivalents include both currently known equivalents as well as equivalents developed in the future, i.e., any elements developed that perform the same function, regardless of structure. 

What is claimed is:
 1. A method for providing at least a link to a content item related to a multimedia content element respective of an on-image gesture, comprising: receiving, from a user device, at least the on-image gesture and the multimedia content element; analyzing the at least on-image gesture to determine at least one portion of the multimedia content element in which a user is interested; generating at least one signature for each of the at least a portion; determining a content item corresponding to the at least one identified portion of multimedia content, wherein the determination is based in part on a type of the at least on-image gesture; and modifying the received multimedia content element to include at least a link to an informative resource containing the content item.
 2. The method of claim 1, further comprising: receiving at least one event related to the received multimedia content element; and determining the content item corresponding to the at least one identified portion of multimedia content using the generated signatures, the at least one event, and the on-image gesture.
 3. The method of claim 1, wherein an on-gesture is any one of: a touch gesture, a scroll-over, a mouse click, wherein the touch gesture is detected on a user device having a touch screen display.
 4. The method of claim 1, wherein the event is at least one of viewing the multimedia content element for a specified period of time and interacting with the multimedia content element for a specified period of time.
 5. The method of claim 2, further comprising: determining a type of the on-image gesture and a type of the at least one event; and determining a type of the content item based on at least one of: the type of the on-image gesture and the type of the at least one event.
 6. The method of claim 1, further comprising: determining the context of the multimedia content element respective of the generated signature; and determining the content item based on the context of the multimedia content element respective of the generated signature.
 7. The method of claim 1, wherein any one of the multimedia content element and the content item is at least one of: an image, graphics, a video stream, a video clip, an audio stream, an audio clip, a video frame, a photograph, images of signals, combinations thereof, and portions thereof.
 8. The method of claim 1, wherein the at least link is added to the multimedia content element as an overlay object, wherein the modified multimedia content is embedded in a web page displayed in a web-browser of the user device.
 9. The method of claim 8, wherein the overlay object comprises a vocabulary of at least one on-gesture determined as corresponding to the at least identified portion of the received multimedia content element.
 10. A non-transitory computer readable medium having stored thereon instructions for causing one or more processing units to execute the method according to claim 1.
 11. A system for providing at least a link to a content item related to a multimedia content element respective of a user gesture, comprising: an interface to a network for receiving a uniform resource locator (URL) of a web-page containing a multimedia content element and at least on-image gesture related to the multimedia content element; a processor; and a memory coupled to the processor, the memory contains instructions that when executed by the processor cause the system to: receive, from a user device, the at least on-image gesture and the multimedia content element; analyze the at least on-image gesture to determine at least one portion of the multimedia content element in which a user is interested; generate at least one signature for each of the at least a portion; determine a content item corresponding to the at least one identified portion of multimedia content, wherein the determination is based in part on a type of the at least on-image gesture; and modify the received multimedia content element to include at least a link to an informative resource containing the content item.
 12. The system of claim 11, wherein the system is further configured to: receive at least one event related to the received multimedia content element; and determine the content item corresponding to the at least one identified portion of multimedia content using the generated signatures, the at least one event and on-image gesture.
 13. The system of claim 12, wherein the on-gesture is any one of: a touch gesture, a scroll-over, a mouse click, wherein the touch gesture is detected on a user device having a touch screen display.
 14. The system of claim 12, wherein the event is at least one of viewing the multimedia content element for a specified period of time and interacting with the multimedia content element for a specified period of time.
 15. The system of claim 12, wherein the system is further configured to: determine a type of the on-image gesture and a type of the at least one event; and determine a type of the content item based on at least one of: the type of the on-image gesture and the type of the at least one event.
 16. The system of claim 12, wherein the system is further configured to: determine the context of the multimedia content element respective of the generated signature; and determine the content item based on the context of the multimedia content element respective of the generated signature.
 17. The system of claim 11, wherein any one of the multimedia content elements and the content item is at least one of: an image, graphics, a video stream, a video clip, an audio stream, an audio clip, a video frame, a photograph, images of signals, combinations thereof, and portions thereof.
 18. The system of claim 11, wherein the at least link is added to the multimedia content element as an overlay object, wherein the modified multimedia content is embedded in a web page displayed in a web-browser of the user device.
 19. The system of claim 18, wherein the overlay object comprises a vocabulary of the at least one on-gesture determined as corresponding to the at least identified portion of the received multimedia content element. 